Stabilization of stochastic approximation by
step size adaptation



Sameer Kamal



Abstract: A scheme for stabilizing stochastic approximation iterates by 
adaptively scaling the step sizes is proposed and analyzed. This scheme leads 
to the same limiting differential equation as the original scheme and therefore 
has the same limiting behavior, while avoiding the difficulties associated with 
projection schemes. The proof technique requires only that the limiting o.d.e. 
descend a certain Lyapunov function outside an arbitrarily large bounded set. 

Key words: stochastic approximation, almost sure boundedness, step size 
adaptation, limiting o.d.e. 
1 Introduction



Stochastic approximation was originally introduced in [15] as a scheme for 
finding zeros of a nonlinear function under noisy measurements. It has since 
become one of the main workhorses of statistical computation, signal pro- 
cessing, adaptive schemes in control engineering and artificial intelligence, 
economic models, and so on. See [I], [7], [9], [11], [13] for some recent texts 
that give an extensive account. One of the successful approaches for its con- 
vergence analysis has been the 'o.d.e. approach' of [10], [H] which treats it as 
a noisy discretization of an ordinary differential equation (o.d.e.) with slowly
decreasing step sizes. The convergence analysis is usually of the form: if the 
iterates remain stable, i.e., a.s. bounded, then they converge a.s. to a set pre- 
dicted by the o.d.e. analysis. Stability tests that establish a.s. boundedness 
are typically geared for specific applications and require stringent assumptions
on the 'drift' term. See, e.g., [1], [8], [16] for some recent stability tests
motivated by reinforcement learning applications, which crucially use, respectively,
long-term stability w.r.t. initial data, exact linear growth, or contraction-like
properties of the drift. There does not seem to be a test broad enough to
cover a reasonably generic class of stochastic approximation algorithms.

An alternative to establishing a priori stability is to force it by suitably 
modifying the algorithm, the most popular modification being to project it 
onto a bounded set every time it exits from the same [12], [9]. This, how- 
ever, is not without its pitfalls. One major problem is that the projection 
operation can introduce spurious equilibria. Another is that the choice of the 
bounded set in question needs to be carefully done, in particular it should 
include the desired asymptotic limit (point or set) which is usually not known 
a priori. 

Motivated by this, we propose and analyze a different scheme for stabi- 
lizing the iterates, viz., an adaptation of step sizes that controls the growth 
of the iterates without affecting their asymptotic behavior. This amounts to 
scaling the step sizes appropriately when the iterates are sufficiently far away 
from the origin. In fact, one can argue that at most a finite random number 
of steps differ from the original scheme. 

Another offshoot of our analysis is that instead of requiring the o.d.e. to
descend the Lyapunov function at every point where the function is not at its
minimum, we only require it to do so outside a sphere of arbitrarily large
radius. While this is hardly surprising, the fact does not seem to have been
formally recorded in the literature.

2 Preliminaries 

Throughout this article we allow the letter c to denote a possibly different 
constant in different places. 



Consider the $\mathbb{R}^d$-valued stochastic approximation iterates

$$x_{n+1} = x_n + a(n)\,[h(x_n) + M_{n+1}], \tag{1}$$

and their 'o.d.e.' limit

$$\dot{x}(t) = h(x(t)). \tag{2}$$

Let $W(\cdot) : \mathbb{R}^d \to [0, \infty)$ be a continuously differentiable Lyapunov function.
We make the following assumptions regarding $h(\cdot)$, $a(n)$, $M_{n+1}$, and $W(\cdot)$.

(A1) $h(\cdot)$ is locally Lipschitz.
(A2) Step size assumptions.

(i) $\sum_n a(n) = \infty$.
(ii) $\sum_n a(n)^2 < \infty$.
(A3) Martingale difference assumptions.

(i) $(M_n)$ is a martingale difference sequence w.r.t. the filtration $(\mathcal{F}_n)$
where $\mathcal{F}_n = \sigma(x_0, M_1, \ldots, M_n)$. Thus, $E[M_{n+1} \mid \mathcal{F}_n] = 0$ a.s. for
all $n \ge 0$.

(ii) $M_n$ is square integrable for all $n \ge 0$ and there exists a locally
bounded and measurable function $f(\cdot) : \mathbb{R}^d \to [0, \infty)$ such that

$$E[\|M_{n+1}\|^2 \mid \mathcal{F}_n] \le f(x_n) \quad \text{a.s.}$$

(A4) Lyapunov function assumptions.

(i) $W(x) \ge 0$ for all $x \in \mathbb{R}^d$ and $W(x) \to \infty$ as $\|x\| \to \infty$.
(ii) There exists a positive integer, say $M$, such that

$$h(x) \cdot \nabla W(x) < 0 \quad \text{whenever } W(x) \ge M.$$

We next define a generalization of the iteration scheme (1). First, choose
a positive integer $N$, with $M < N \le \infty$, such that there is a finite positive
constant $c_N$ satisfying

$$c_N \ge 1 \vee \sup_{y \,:\, W(y) \le N} \frac{\|h(y)\|^2 + f(y)}{1 \vee W(y)}. \tag{3}$$

At least for finite $N$, assumptions (A1) and (A3)(ii) guarantee such a choice
for $c_N$. Having chosen a suitable $N$, choose a locally bounded measurable
function $g(\cdot) : \mathbb{R}^d \to \mathbb{R}$ such that

$$g(y) = 1 \ \text{ if } W(y) \le N, \qquad g(y) \ge 1 \vee \sqrt{\frac{\|h(y)\|^2 + f(y)}{c_N W(y)}} \ \text{ if } W(y) > N. \tag{4}$$

Again, assumptions (A1) and (A3)(ii) guarantee such a choice for $g(\cdot)$. We
thus have, for some suitable $N$, possibly infinite, the following inequality:

$$c_N W(y) \ge \frac{\|h(y)\|^2 + f(y)}{g(y)^2} \quad \text{if } W(y) \ge M. \tag{5}$$

Having chosen $g(\cdot)$, consider the iterates $\{y_n\}$ generated by

$$y_{n+1} = y_n + a^\omega(n)\,[h(y_n) + M_{n+1}], \tag{6}$$

where

$$a^\omega(n) := a(n)/g(y_n). \tag{7}$$

This is a generalization of the original iteration scheme (1) since the step
size $a^\omega(n)$ is now an $\mathcal{F}_n$-measurable random step size. We note that by our
choice

• $g(\cdot)$ is a locally bounded function, and

• $g(y) \ge 1$ for all $y \in \mathbb{R}^d$.

Remark 1. By choosing $N$ large enough we can ensure $g(y) = 1$ for $y$ in
an arbitrarily large sphere around the origin. If $c_\infty < \infty$, we can choose
$N = \infty$, in which case $g(y) = 1$ for all $y \in \mathbb{R}^d$ and we recover the original
scheme (1).

Remark 2. Since $g(y) \ge 1$ for all $y \in \mathbb{R}^d$, it follows from assumption
(A2)(ii) that the random step sizes satisfy

$$\sum_n a^\omega(n)^2 < \infty \quad \text{a.s.}$$
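To make the modified scheme concrete, the following is a minimal scalar sketch of the iteration (6) with the random step size (7). The function name `adaptive_sa`, the callable arguments, and every choice in the demo (the drift, the noise law, the scaling function, the step sizes, the starting point) are illustrative assumptions rather than part of the paper.

```python
import random


def adaptive_sa(h, noise, g, a, y0, n_steps, seed=0):
    """Run y_{n+1} = y_n + (a(n) / g(y_n)) * (h(y_n) + M_{n+1}) for a scalar iterate.

    noise(y, rng) returns a sample of M_{n+1} given y_n = y.
    Returns the trajectory and the random step sizes that were used.
    """
    rng = random.Random(seed)
    y = y0
    path, steps = [y0], []
    for n in range(n_steps):
        step = a(n) / g(y)  # depends only on y_n, hence known at time n
        y = y + step * (h(y) + noise(y, rng))
        path.append(y)
        steps.append(step)
    return path, steps


# Hypothetical demo: h(y) = -y, noise uniform on [-1, 1],
# g = 1 on |y| <= 10 and growing linearly outside, a(n) = 1 / (n + 1).
path, steps = adaptive_sa(
    h=lambda y: -y,
    noise=lambda y, rng: rng.uniform(-1.0, 1.0),
    g=lambda y: max(1.0, abs(y) / 10.0),
    a=lambda n: 1.0 / (n + 1),
    y0=50.0,
    n_steps=1000,
)
```

Because the scaling function is at least 1 everywhere, each realized step is no larger than the corresponding deterministic step, which is the observation behind Remark 2.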



3 A test for stability 

Let $m$ be an arbitrary positive integer, $m > M$. Define the level set

$$H^m := \{x : W(x) < m\},$$

and let $\bar{H}^m$ denote the closure of $H^m$. Since $h(x) \cdot \nabla W(x) < 0$ whenever
$W(x) \ge M$, we get

$$\dot{W}(x) := h(x) \cdot \nabla W(x) < 0 \quad \text{for } x \in \bar{H}^m \setminus H^M.$$

As $\bar{H}^m \setminus H^M$ is a compact set, and $\dot{W}(\cdot)$ is a continuous function, there
must exist a negative constant $c$ such that

$$\sup_{x \in \bar{H}^m \setminus H^M} \dot{W}(x) \le c < 0. \tag{8}$$
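As a numerical illustration of the negative constant in (8), the sketch below evaluates $h(x) W'(x)$ on a grid over the set where $M \le W(x) \le m$ for a scalar instance. The instance is an assumption chosen for illustration: $W(x) = x^2$ and $h(x) = -x \exp(|x|)$ (the drift of the first scalar example at the end of the paper), with $M = 1$ and $m = 4$; the helper `sup_w_dot` is hypothetical.

```python
import math


def sup_w_dot(h, grad_w, w, lower, upper, x_max, n_grid=4001):
    """Largest value of h(x) * W'(x) over grid points with lower <= W(x) <= upper."""
    best = -math.inf
    for i in range(n_grid):
        x = -x_max + 2.0 * x_max * i / (n_grid - 1)
        if lower <= w(x) <= upper:
            best = max(best, h(x) * grad_w(x))
    return best


# Hypothetical scalar instance: W(x) = x^2, h(x) = -x * exp(|x|), M = 1, m = 4.
c = sup_w_dot(
    h=lambda x: -x * math.exp(abs(x)),
    grad_w=lambda x: 2.0 * x,
    w=lambda x: x * x,
    lower=1.0,
    upper=4.0,
    x_max=2.0,
)
```

For this instance $h(x) W'(x) = -2x^2 \exp(|x|)$, so the supremum is attained on the inner boundary $|x| = 1$ and equals $-2e$; the grid contains $x = \pm 1$, so the computed value agrees with this up to rounding.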

Fix some $T > 0$. Note that (A1) and (A4) ensure the well-posedness of
the o.d.e. given by (2) for $t \ge 0$. Let $y^u(t)$ be the o.d.e. trajectory starting
from $u$. Thus, $\dot{y}^u(t) = h(y^u(t))$ for $t \ge 0$, and $y^u(0) = u$. Choose a positive
but arbitrarily small $\epsilon_m$ satisfying

$$\epsilon_m < 1 \wedge \inf\{|W(u) - W(v)| : u, v \in \bar{H}^m \setminus H^M \text{ and } v = y^u(T)\}.$$

Note that $\epsilon_m > 0$ is possible because of (8). Given $\epsilon_m$, choose a positive but
arbitrarily small $\delta_m$ such that:

if $u, v \in \bar{H}^m$ and $\|u - v\| < \delta_m$, then $|W(u) - W(v)| < \epsilon_m/2$.

Note that $\delta_m > 0$ is possible because $W(\cdot)$ is a continuous function and $\bar{H}^m$
is a compact set.

Remark 3. The fact that both $\epsilon_m$ and $\delta_m$ can be chosen positive but
arbitrarily small will prove crucial later.

Let $n_0 \ge 0$. Given $n_i(\omega)$, define $n_{i+1}(\omega)$ as

$$n_{i+1}(\omega) := \inf\Big\{ n > n_i(\omega) : \sum_{j=n_i(\omega)}^{n-1} a^\omega(j) \ge T \Big\}.$$
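The block times $n_i$ can be read off from a realized sequence of step sizes. The sketch below assumes the convention that $n_{i+1}$ is the first index $n > n_i$ for which the steps with indices $n_i, \ldots, n-1$ add up to at least $T$; the function name `block_times` and the sample data are hypothetical.

```python
def block_times(steps, T, n0=0):
    """Return [n_0, n_1, ...], where n_{i+1} is the first n > n_i such that
    steps[n_i] + ... + steps[n - 1] >= T.

    The list ends when the remaining steps no longer accumulate to T.
    """
    times = [n0]
    acc = 0.0
    for n in range(n0, len(steps)):
        acc += steps[n]
        if acc >= T:
            times.append(n + 1)  # block covers indices n_i, ..., n
            acc = 0.0
    return times


# Hypothetical step sizes, chosen so the partial sums are exact in floating point.
example_steps = [0.5, 0.5, 0.25, 0.25, 0.25, 0.25, 0.1]
example_times = block_times(example_steps, T=1.0)
```

In this example the first two steps sum to 1 and the next four sum to 1, so the block times are 0, 2 and 6, while the trailing step of 0.1 is too short to close another block.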



Consider the $\delta_m$-neighbourhood of $H^m$:

$$N^{\delta_m}(H^m) := \Big\{ x : \inf_{y \in H^m} \|x - y\| < \delta_m \Big\}.$$

Note that $1\{y_n \in N^{\delta_m}(H^m)\}\, a^\omega(n) M_{n+1}$ is a martingale difference term.
Since $N^{\delta_m}(H^m)$ is a bounded set, and $f(\cdot)$ is locally bounded, it follows
from assumption (A3)(ii) and Remark 2 that

$$\sum_n E\Big[\big\|1\{y_n \in N^{\delta_m}(H^m)\}\, a^\omega(n) M_{n+1}\big\|^2 \,\Big|\, \mathcal{F}_n\Big] \le \Big(\sup_{y \in N^{\delta_m}(H^m)} f(y)\Big) \times \sum_n a^\omega(n)^2 < \infty \quad \text{a.s.} \tag{9}$$

This leads to:

Lemma 4. Assume (A2)-(A4). For any positive integer $m > M$ we have:

$$\sum_n 1\{y_n \in N^{\delta_m}(H^m)\}\, a^\omega(n) M_{n+1} \quad \text{converges a.s.}$$

Proof. This is immediate from (9) and the convergence theorem for square-integrable
martingales, Theorem 3.3.4, p. 53, of [5]. □

From Lemma 4 it follows that almost surely there exists an $N(\omega, m)$ such
that if $n_0(\omega) \ge N(\omega, m)$ then

$$\sup_{n \ge n_0(\omega)} \Big\| \sum_{j=n_0(\omega)}^{n} 1\{y_j \in N^{\delta_m}(H^m)\}\, a^\omega(j) M_{j+1} \Big\| < \frac{\delta_m}{2\exp(KT)}, \tag{10}$$

with the constant $K$ as defined below.



Remark 5. Note that Lemma 4 guarantees the convergence of the martingale
$\sum_n 1\{\cdots\}\, a^\omega(n) M_{n+1}$ while not saying anything about $\sum_n a^\omega(n)$. Since the
martingale converges, there must exist an $N(\omega, m)$ satisfying (10) even if
$\sum_n a^\omega(n) < \infty$. That $\sum_n a^\omega(n) = \infty$ a.s. needs a proof. In what follows, we
give a sufficient condition for stability and show that it is also sufficient for
$\sum_n a^\omega(n) = \infty$ a.s.
Let $K$ be the Lipschitz constant of $h(\cdot)$ on $N^{\delta_m}(H^m)$. Without loss of
generality we assume that $N(\omega, m)$ is large enough that if $n_0(\omega) \ge N(\omega, m)$,
then

$$K \Big( \sup_{y \in N^{\delta_m}(H^m)} \|h(y)\| \Big) \sum_{n \ge n_0(\omega)} a^\omega(n)^2 < \frac{\delta_m}{2\exp(KT)}. \tag{11}$$



Lemma 6. Assume (A1)-(A4). Let $m$ be a positive integer with $m > M$.
Let $n_0(\omega)$ satisfy $n_0(\omega) \ge N(\omega, m)$. Under this base condition, the following
inductive step holds: if $n_i(\omega) < \infty$ and $y_{n_i}(\omega) \in H^m$, then

1. $y_j(\omega) \in N^{\delta_m}(H^m)$ a.s. for $n_i(\omega) \le j \le n_{i+1}(\omega)$,

2. $n_{i+1}(\omega) < \infty$ a.s., and

3. Almost surely, either

• $W(y_{n_{i+1}}(\omega)) < W(y_{n_i}(\omega)) - \epsilon_m/2$, or

• $y_{n_{i+1}}(\omega) \in N^{\delta_m}(H^M)$.
In particular, in either case, $y_{n_{i+1}}(\omega) \in H^m$ a.s.

Proof. We first show by induction that $y_j(\omega) \in N^{\delta_m}(H^m)$ for $n_i(\omega) \le j \le n_{i+1}(\omega)$.
By assumption, $y_{n_i}(\omega) \in H^m \subset N^{\delta_m}(H^m)$. Fix $j$ in the range
$n_i(\omega) < j \le n_{i+1}(\omega)$. Assume $y_k(\omega) \in N^{\delta_m}(H^m)$ for $n_i(\omega) \le k \le j - 1$. We
need to show that $y_j(\omega) \in N^{\delta_m}(H^m)$. If

$$K \Big( \sup_{n_i(\omega) \le k \le j-1} \|h(y_k)\| \Big) \Big( \sum_{n=n_i(\omega)}^{j-1} a^\omega(n)^2 \Big) < \frac{\delta_m}{2\exp(KT)} \tag{12}$$

and

$$\sup_{n_i(\omega) \le k \le j-1} \Big\| \sum_{n=n_i(\omega)}^{k} a^\omega(n) M_{n+1} \Big\| < \frac{\delta_m}{2\exp(KT)}, \tag{13}$$

then by a standard application of the Gronwall inequality (see, e.g., Lemma 2.1
in [7]) $y_j(\omega)$ will satisfy

$$\Big\| y_j(\omega) - y^{y_{n_i}(\omega)}\Big( \sum_{n=n_i(\omega)}^{j-1} a^\omega(n) \Big) \Big\| < \delta_m. \tag{14}$$

From the assumption that $n_0(\omega) \ge N(\omega, m)$ it follows that (10) and (11)
hold. These equations, coupled with the assumption that $y_k(\omega) \in N^{\delta_m}(H^m)$
for $n_i(\omega) \le k \le j - 1$, imply (12) and (13), which in turn imply (14). Since
the o.d.e. trajectory will always be in $H^m$ if it starts there, (14) implies

$$y_j(\omega) \in N^{\delta_m}(H^m).$$

Induction now proves the first claim.

For the second claim we give a proof by contradiction. Consequently,
assume that $n_{i+1}(\omega) = \infty$. The first claim, which has already been proved,
now gives $y_j(\omega) \in N^{\delta_m}(H^m)$ a.s. for $n_i(\omega) \le j < \infty$. Therefore, since $g(\cdot)$ is
a locally bounded function, we get $\sup_{j \ge n_i(\omega)} g(y_j(\omega)) < \infty$. By assumption
(A2)(i) this gives

$$\sum_{j \ge n_i(\omega)} a^\omega(j) \ge \frac{\sum_{j \ge n_i(\omega)} a(j)}{\sup_{j \ge n_i(\omega)} g(y_j(\omega))} = \infty.$$

Since $n_{i+1}(\omega) = \infty$ requires $\sum_{j \ge n_i(\omega)} a^\omega(j) \le T$, we get the required
contradiction. Thus $n_{i+1}(\omega) < \infty$ a.s.

We turn to the final claim. Let $z = y^{y_{n_i}(\omega)}\big( \sum_{n=n_i(\omega)}^{n_{i+1}(\omega)-1} a^\omega(n) \big)$, the point reached by the o.d.e.
trajectory after time $\approx T$ starting from $y_{n_i}(\omega)$. Since the o.d.e. starts in
$H^m$, it remains in $H^m$. There are two cases to consider.

• If $z \in H^m \setminus H^M$, the definition of $\epsilon_m$ implies that $W(z) < W(y_{n_i}(\omega)) - \epsilon_m$.
Since (14) holds for $j = n_{i+1}(\omega)$, we have $\|y_{n_{i+1}}(\omega) - z\| < \delta_m$.
From the definition of $\delta_m$ it follows that $W(y_{n_{i+1}}(\omega)) < W(z) + \epsilon_m/2$.
We get $W(y_{n_{i+1}}(\omega)) < W(y_{n_i}(\omega)) - \epsilon_m/2$. In particular, $y_{n_{i+1}}(\omega) \in H^m$.

• If $z \in H^M$, then, since $\|y_{n_{i+1}}(\omega) - z\| < \delta_m$, we get $y_{n_{i+1}}(\omega) \in N^{\delta_m}(H^M)$.
Since $N^{\delta_m}(H^M) \subset H^{M + \epsilon_m/2}$ and $m > M + 1/2$, we get $y_{n_{i+1}}(\omega) \in H^m$.

The proof is complete. □

Define the stopping times

$$\tau_k^m(\omega) := \inf\{n \ge k : W(y_n(\omega)) < m\}.$$
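For a finite simulated trajectory, the stopping time and the stopped sequence $y_{n \wedge \tau}$ used in the remainder of the argument can be computed as in the following sketch. The helper names `stopping_time` and `stopped_path` and the sample trajectory are assumptions; on a finite path, a return value of `math.inf` only means that the level set was not entered within the observed horizon.

```python
import math


def stopping_time(path, W, k, m):
    """First index n >= k with W(path[n]) < m, or math.inf if none is observed."""
    for n in range(k, len(path)):
        if W(path[n]) < m:
            return n
    return math.inf


def stopped_path(path, tau):
    """Stopped sequence: entry n equals path[min(n, tau)]."""
    return [path[min(n, tau)] for n in range(len(path))]


# Hypothetical trajectory with W(x) = x^2.
sample = [5.0, 3.0, 0.5, 2.0, 0.1]
tau = stopping_time(sample, lambda x: x * x, k=0, m=1)
frozen = stopped_path(sample, tau)
```

Here the iterate first enters the set where $W < 1$ at index 2, so the stopped sequence is frozen at the value 0.5 from that index onward.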

The next result establishes the fact that if $W(y_n(\omega)) < m$ for infinitely many
$n$, then almost surely the iterates converge to $H^M$.

Proposition 7. Assume (A1)-(A4). For any arbitrary $m > M$, if $\tau_k^m(\omega) < \infty$
for all $k$, then $y_n(\omega) \to H^M$ a.s.

Proof. Assume $\tau_k^m(\omega) < \infty$ for all $k$. From the definition of $\tau_k^m$ this implies
that given any $k$ there exists an $n$ with $n \ge k$ such that $y_n(\omega) \in H^m$. In
other words, the iterates are in $H^m$ infinitely often. By Remark 5 there
exists an $N(\omega, m)$ satisfying (10). Since the iterates are in $H^m$ infinitely
often there exists an $n_0 \ge N(\omega, m)$ such that $y_{n_0} \in H^m$. From Lemma 6 we
know that if $n_i(\omega) < \infty$ then almost surely $n_{i+1}(\omega) < \infty$. By induction it
follows that $n_i(\omega) < \infty$ a.s. for all $i \in \mathbb{Z}_+$. Invoking Lemma 6 again, we
get that either $W(y_{n_{i+1}}(\omega)) < W(y_{n_i}(\omega)) - \epsilon_m/2$, or $y_{n_{i+1}}(\omega) \in N^{\delta_m}(H^M)$.
Since $W(\cdot)$ cannot keep decreasing by $\epsilon_m/2$ forever, it follows that for some
$i$, $y_{n_i}(\omega) \in N^{\delta_m}(H^M)$. Note that $N^{\delta_m}(H^M) \subset H^{M + \epsilon_m/2}$. Consequently, if
$y_{n_i}(\omega) \in N^{\delta_m}(H^M)$ then $W(y_{n_i}(\omega)) < M + \epsilon_m/2$ and so $y_{n_{i+1}}(\omega) \in N^{\delta_m}(H^M)$.
It follows that the iterates $y_{n_i}(\omega)$ will eventually get trapped in $N^{\delta_m}(H^M)$.
Once the iterates $y_{n_i}(\omega)$ are trapped in $N^{\delta_m}(H^M) \subset H^{M + \epsilon_m/2}$, the o.d.e.
starting from $y_{n_i}(\omega)$ will remain in $H^{M + \epsilon_m/2}$. It follows that once the iterates
$y_{n_i}(\omega)$ are trapped in $N^{\delta_m}(H^M)$, the intermediate iterates $y_j(\omega)$, $n_i(\omega) \le j \le n_{i+1}(\omega)$,
will get trapped in $N^{\delta_m}(H^{M + \epsilon_m/2})$. Since both $\epsilon_m$ and $\delta_m$
were chosen arbitrarily small positive quantities (see Remark 3), the result
follows. □

Consider two statements of stability: first,

$$y_n(\omega) \to H^M \quad \text{a.s.}, \tag{15}$$

and second, for every positive integer $k$,

$$y_{n \wedge \tau_k^M}(\omega) \to H^M \quad \text{a.s.} \tag{16}$$

The next result establishes the equivalence of the two stability statements.

Lemma 8. Under assumptions (A1)-(A4), the two stability statements (15)
and (16) are equivalent.



Proof. Clearly (15) implies (16). For the converse, assume (16). We need to
show that

$$P\big[\, y_{n \wedge \tau_k^M}(\omega) \to H^M \ \forall k \ \text{ and } \ y_n(\omega) \not\to H^M \,\big] = 0.$$

Fix an $m > M$. Let $\omega$ be such that $y_{n \wedge \tau_k^M}(\omega) \to H^M$ $\forall k$ and $y_n(\omega) \not\to H^M$.
Since $y_n(\omega) \not\to H^M$, by Proposition 7 there exists a $k$ such that $\tau_k^m(\omega) = \infty$
a.s. For this choice of $k$, since $m > M$, it follows that $\tau_k^M(\omega) = \infty$ a.s.
Thus for this $k$, $y_{n \wedge \tau_k^M}(\omega) \to H^M$ reduces to $y_n(\omega) \to H^M$ a.s. The result
follows. □

On the basis of Lemma 8 we get the following test for stability: if, for every
$k$, $y_{n \wedge \tau_k^M}(\omega) \to H^M$ a.s., then $\sup_n \|y_n\| < \infty$ a.s. Note that it does not
require the o.d.e. to descend the Lyapunov function inside the arbitrarily
large set $H^M$. In the next section we give a sufficient condition for this
stability test.



4 A sufficient condition for stability 

In this section we show that assumption (A5) below is sufficient for stability. 

(A5) Let the $W(\cdot)$ of (A4) be twice continuously differentiable such that all
second order derivatives of $W(\cdot)$ are bounded in absolute value by a
constant.

We start with a few lemmas. 

Lemma 9. Assume (A1)-(A5). For any positive integer $k$, and for any
$\mathcal{F}_k$-measurable set $A$, if $E[W(y_k(\omega)); A] < \infty$ then

$$\sup_{n \ge k} E\big[W(y_{n \wedge \tau_k^M}(\omega)); A\big] < \infty.$$



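Proof. We have

$$y_{(n+1) \wedge \tau_k^M} = y_{n \wedge \tau_k^M} + a^\omega(n)\, I\{\tau_k^M > n\}\, \big[ h(y_{n \wedge \tau_k^M}) + M_{n+1} \big].$$

Doing a Taylor expansion and using the fact that the second order space
derivatives of $W(\cdot)$ are bounded, we get

$$W(y_{(n+1) \wedge \tau_k^M}) \le W(y_{n \wedge \tau_k^M}) + a^\omega(n)\, I\{\tau_k^M > n\}\, \nabla W(y_{n \wedge \tau_k^M}) \cdot \big[ h(y_{n \wedge \tau_k^M}) + M_{n+1} \big] + c\, a^\omega(n)^2\, I\{\tau_k^M > n\}\, \big\| h(y_{n \wedge \tau_k^M}) + M_{n+1} \big\|^2.$$

Since $I\{\tau_k^M > n\}\, \nabla W(y_{n \wedge \tau_k^M}) \cdot h(y_{n \wedge \tau_k^M}) \le 0$ and $E\big[ h(y_{n \wedge \tau_k^M}) \cdot M_{n+1} \mid \mathcal{F}_n \big] = 0$
(and likewise $E\big[ \nabla W(y_{n \wedge \tau_k^M}) \cdot M_{n+1} \mid \mathcal{F}_n \big] = 0$), we get

$$E\big[ W(y_{(n+1) \wedge \tau_k^M}) \mid \mathcal{F}_n \big] \le W(y_{n \wedge \tau_k^M}) + c\, a^\omega(n)^2\, E\Big[ I\{\tau_k^M > n\} \big( \|h(y_{n \wedge \tau_k^M})\|^2 + \|M_{n+1}\|^2 \big) \,\Big|\, \mathcal{F}_n \Big].$$

From (5) and the definition of $a^\omega(n)$, it follows that

$$E\big[ W(y_{(n+1) \wedge \tau_k^M}) \mid \mathcal{F}_n \big] \le W(y_{n \wedge \tau_k^M}) + c\, a(n)^2 \cdot c_N W(y_{n \wedge \tau_k^M}) \le \big(1 + c\, a(n)^2\big)\, W(y_{n \wedge \tau_k^M}) \le \exp\big(c\, a(n)^2\big)\, W(y_{n \wedge \tau_k^M}).$$

For $n \ge k$, integrating gives

$$E\big[ W(y_{(n+1) \wedge \tau_k^M}); A \big] \le \exp\Big( c \sum_{i=k}^{n} a(i)^2 \Big)\, E[W(y_k(\omega)); A] < \infty.$$

The result follows. □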
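The last two steps of this proof rest on the elementary bound $1 + x \le e^x$ applied factor by factor, together with the square summability of the step sizes. The sketch below checks the resulting product bound numerically; the function name, the constant $c = 3$ and the step sizes $a(i) = 1/(i+1)$ are illustrative assumptions.

```python
import math


def product_and_exponential_bound(c, a, k, n):
    """Return (prod_{i=k}^{n} (1 + c a(i)^2), exp(c sum_{i=k}^{n} a(i)^2))."""
    prod = 1.0
    total = 0.0
    for i in range(k, n + 1):
        prod *= 1.0 + c * a(i) ** 2
        total += a(i) ** 2
    return prod, math.exp(c * total)


# Hypothetical choices: c = 3 and a(i) = 1 / (i + 1).
prod, bound = product_and_exponential_bound(3.0, lambda i: 1.0 / (i + 1), 0, 500)
```

With these step sizes the sum of squares is at most $\pi^2/6$, so the exponential bound stays below $\exp(c \pi^2 / 6)$ no matter how many terms are included, which is what keeps the supremum over $n$ finite.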



The next lemma is independent of assumption (A5) and requires only 
assumptions (A1)-(A4) for its proof. 

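Lemma 10. Assume (A1)-(A4). Let $k$ be an arbitrary positive integer.
Let $A$ be an arbitrary $\mathcal{F}_k$-measurable set. If

$$\sup_{n \ge k} E\big[ W(y_{n \wedge \tau_k^M}(\omega)); A \big] < \infty$$

then

$$P\big[ A \cap \big( y_{n \wedge \tau_k^M}(\omega) \not\to H^M \big) \big] = 0.$$

Proof. Assume $y_{n \wedge \tau_k^M}(\omega) \not\to H^M$. Clearly, this implies that $\tau_k^M(\omega) = \infty$
and so $y_{n \wedge \tau_k^M}(\omega) = y_n(\omega)$. It follows that $y_n(\omega) \not\to H^M$. Now, let $u$ be an
arbitrary integer, $u > M$. By Proposition 7, since $y_n(\omega) \not\to H^M$, there exists
an integer $l$, $l \ge k$, such that $\tau_l^u(\omega) = \infty$ a.s. It follows that

$$\big\{ A \cap \big( y_{n \wedge \tau_k^M}(\omega) \not\to H^M \big) \big\} = \bigcup_{l \ge k} \big\{ A \cap \big( y_{n \wedge \tau_k^M}(\omega) \not\to H^M \text{ and } \tau_l^u(\omega) = \infty \big) \big\}.$$

Since $\{\tau_l^u(\omega) = \infty\} \subset \{\tau_{l+1}^u(\omega) = \infty\}$ for all $l$, it follows that there exists a
positive integer $L \ge k$ such that

$$P\big[ A \cap \big( y_{n \wedge \tau_k^M}(\omega) \not\to H^M \text{ and } \tau_L^u(\omega) = \infty \big) \big] \ge \tfrac{1}{2} \times P\big[ A \cap \big( y_{n \wedge \tau_k^M}(\omega) \not\to H^M \big) \big].$$

Combining everything we get the following inequalities:

$$\sup_{n \ge k} E\big[ W(y_{n \wedge \tau_k^M}(\omega)); A \big] \ge E\big[ W(y_{L \wedge \tau_k^M}(\omega)); A \big]$$
$$\ge u \times P\big[ A \cap \big( \tau_k^M(\omega) = \infty \text{ and } \tau_L^u(\omega) = \infty \big) \big]$$
$$\ge u \times P\big[ A \cap \big( y_{n \wedge \tau_k^M}(\omega) \not\to H^M \text{ and } \tau_L^u(\omega) = \infty \big) \big]$$
$$\ge \tfrac{u}{2} \times P\big[ A \cap \big( y_{n \wedge \tau_k^M}(\omega) \not\to H^M \big) \big].$$

Since $u$ is arbitrary and $\sup_{n \ge k} E\big[ W(y_{n \wedge \tau_k^M}(\omega)); A \big] < \infty$, the result follows. □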

Lemma 11. Assume (A1)-(A5). For $k$ an arbitrary positive integer, we
have

$$P\big[ y_{n \wedge \tau_k^M}(\omega) \not\to H^M \big] = 0.$$

Proof. Define $A^l := \{\omega : W(y_k(\omega)) < l\}$. Clearly $A^l$ is $\mathcal{F}_k$-measurable and
$E[W(y_k(\omega)); A^l] \le l < \infty$. It follows from Lemma 9 that

$$\sup_{n \ge k} E\big[ W(y_{n \wedge \tau_k^M}(\omega)); A^l \big] < \infty.$$

Lemma 10 now gives us

$$P\big[ A^l \cap \big( y_{n \wedge \tau_k^M}(\omega) \not\to H^M \big) \big] = 0.$$

Since $P\big[ \bigcup_l A^l \big] = 1$ it follows that $P\big[ y_{n \wedge \tau_k^M}(\omega) \not\to H^M \big] = 0$. □

We now give our main results and a couple of examples.

Theorem 12. Under assumptions (A1)-(A5),

$$y_n(\omega) \to H^M \quad \text{a.s.}$$

In particular, $\sup_n \|y_n\| < \infty$ a.s.

Proof. The result follows from Lemma 8 and Lemma 11. □

The next results establish that the iterates $(y_n)$ indeed capture the desired
behaviour as time goes to infinity.

Proposition 13. Under assumptions (A1)-(A5), almost surely $a^\omega(n) = a(n)$
for all except finitely many $n$. In particular,

$$\sum_n a^\omega(n) = \infty \quad \text{a.s.}$$

Proof. By Theorem 12, $y_n(\omega) \to H^M$ a.s. Since $g(y) = 1$ for $y \in H^N$, and
$N > M$, it follows that $g(y_n) = 1$ for all except finitely many $n$. □

Finally, following [3] (see also [7], Chapter 2), we get 

Theorem 14. Under assumptions (A1)-(A5), the iterates (y n ) converge 
a.s. to an internally chain transitive set of the o.d.e. 

We also get a condition for the convergence of the iterates $(x_n)$ obtained
by the original iteration scheme as given by (1).

Theorem 15. Under assumptions (A1)-(A5), if

$$\sup_{x \in \mathbb{R}^d} \frac{\|h(x)\|^2 + f(x)}{1 \vee W(x)} = c_\infty < \infty$$

then the original iterates $(x_n)$ converge a.s. to an internally chain transitive
set of the o.d.e.

Proof. By Remark 1 we can set $N = \infty$ in (3). Now (4) gives $g(x) = 1$ for all
$x \in \mathbb{R}^d$. Equation (5) continues to hold with $c_\infty$ in place of $c_N$. The choice
of $g(\cdot)$ gives $a^\omega(n) = a(n)$ for all $n$, or $x_n(\omega) = y_n(\omega)$ for all $n$. The result
now follows from Theorem 14. □



Example 16. Consider the scalar iteration

$$x_{n+1} = x_n - a(n)\, x_n \exp(|x_n|)\,(1 + \xi_{n+1}),$$

where $\{\xi_n\}$ are i.i.d. $N(0, 1)$ (say). Here $W(x) = x^2$ and $g(x) = O(\exp(|x|))$
will do.
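A small simulation can illustrate the effect of the scaling in this example. Reading the iteration as (1) with $h(x) = -x \exp(|x|)$ and $M_{n+1} = -x_n \exp(|x_n|)\, \xi_{n+1}$, the sketch below divides the step by a scaling function of order $\exp(|x|)$. The specific choices $a(n) = 0.1/(n+1)$ and $g(x) = \max(1, \exp(|x| - R))$ with $R = 2$, the starting point, and the function name `simulate_example` are all assumptions made for illustration.

```python
import math
import random


def simulate_example(x0, n_steps, radius=2.0, seed=0):
    """Iterate x_{n+1} = x_n - a(n) x_n exp(|x_n|) (1 + xi_{n+1}) / g(x_n).

    Assumed choices: a(n) = 0.1 / (n + 1) and g(x) = max(1, exp(|x| - radius)),
    so g = 1 on |x| <= radius and g grows like exp(|x|) outside.
    radius = math.inf gives g = 1 everywhere (the unscaled iteration).
    Returns the path, or None if the computation leaves the float range.
    """
    rng = random.Random(seed)
    x, path = x0, [x0]
    for n in range(n_steps):
        a_n = 0.1 / (n + 1)
        xi = rng.gauss(0.0, 1.0)
        try:
            # exp(|x|) / g(x) equals exp(min(|x|, radius)) for this g
            scaled_growth = math.exp(min(abs(x), radius))
        except OverflowError:
            return None
        x = x - a_n * x * scaled_growth * (1.0 + xi)
        if not math.isfinite(x):
            return None
        path.append(x)
    return path


scaled = simulate_example(10.0, 2000)                     # adaptive step sizes
unscaled = simulate_example(10.0, 2000, radius=math.inf)  # may come back as None
```

With the scaling switched on, the multiplicative factor applied to $x_n$ at each step is bounded by a quantity of order $a(n) e^{R}$, so the computation stays within floating-point range. With the scaling switched off, a single step from $x_0 = 10$ can already move the iterate by a multiple of $\exp(10)$, after which evaluating the drift can overflow; whether this happens on a given run depends on the noise draw.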
Example 17. Consider the scalar iteration (1) with bounded $h(\cdot)$ satisfying

$$\lim_{x \to \infty} h(x) = -\lim_{x \to -\infty} h(x) = -1,$$

with $\{M_n\}$ i.i.d. uniform on $[-1, 1]$. Then $W(x) = x^2$ and $g(x) = 1$ will do.
In particular, there is no need to adaptively scale the step sizes.
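For a concrete instance of this second example, the sketch below takes $h(x) = -\tanh(x)$, which is bounded and has the stated limits, and runs the unscaled iteration. The drift, the step sizes $a(n) = 1/(n+1)$, the starting point and the function name are assumptions chosen only for illustration.

```python
import math
import random


def simulate_bounded_drift(x0, n_steps, seed=0):
    """Iterate x_{n+1} = x_n + a(n) (h(x_n) + M_{n+1}) with g = 1.

    Assumed choices: h(x) = -tanh(x), M_{n+1} uniform on [-1, 1],
    a(n) = 1 / (n + 1).
    """
    rng = random.Random(seed)
    x, path = x0, [x0]
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)
        noise = rng.uniform(-1.0, 1.0)
        x = x + a_n * (-math.tanh(x) + noise)
        path.append(x)
    return path


trajectory = simulate_bounded_drift(5.0, 5000)
```

Since both the drift and the noise are bounded by 1 in absolute value, each increment is at most $2a(n)$ in size, so the iterate cannot jump far in a single step. That crude bound by itself still grows with the partial sums of $a(n)$; the almost sure boundedness of the iterates is what the stability result above supplies.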

Note that neither of these two examples, even the apparently simple
Example 17, is covered by the tests of [1], [8], [16].

Acknowledgements: The author would like to thank Prof. V. S. Borkar 
for suggesting this problem and for his comments on an earlier draft which 
included, in particular, the idea of using an adaptive scheme. 